Method for efficient low power motion estimation of a video frame sequence

ABSTRACT

A method for efficient low power motion estimation of a digital video image is provided in which processing requirements are reduced based upon the content being processed. The method performs motion estimation of a current video image using a search window of a previous video image. The method may include forming mean pyramids of a reference macroblock and the search area and a full search at a lowest resolution. A number of candidate motion vectors (CMVs) propagated to lower levels may be dependent on a quantized average deviation estimate (QADE) of a current macroblock and the maximum distortion band obtained during training for that QADE value at that particular level. Training over a sequence may be triggered at the beginning of every sequence. This training technique may be used to determine the value of the maximum distortion band for all QADEs of the macroblocks occurring over the training frames.

FIELD OF THE INVENTION

[0001] The present invention relates to the field of video data processing, and, more particularly, to an efficient low power motion estimation of a video frame sequence wherein processing requirements are reduced. For example, the invention is suitable for use in low bit rate coding of compressed digital video images such as in coders using the H.263 source-coding model. Specifically, the present invention provides a method for adapting the computational complexity involved in the motion estimation operation which is required for generating the motion vectors used in video coding.

BACKGROUND OF THE INVENTION

[0002] The presence of multimedia capabilities on mobile terminals opens up a spectrum of applications, such as video-conferencing, video telephony, security monitoring, information broadcast and other such services. Video compression techniques enable the efficient transmission of digital video signals. Video compression algorithms take advantage of spatial correlation among adjacent pixels to derive a more efficient representation of the important information in a video signal.

[0003] The most powerful compression systems not only take advantage of spatial correlation, but can also utilize temporal correlations among adjacent frames to further boost the compression ratio. In such systems, differential encoding is used to transmit only the difference between an actual frame and a prediction of the actual frame. The prediction is based on information derived from a previous frame of the same video sequence.

[0004] In motion compensation systems, motion vectors are derived by comparing a portion (i.e., a macroblock) of pixel data from a current frame to similar portions (i.e., search area) of the previous frame. A motion estimator determines the closest match of the reference macroblock in the present image using the pixels in the previous image. The criterion used to evaluate similarity is usually the mean absolute difference between the reference macroblock and the pixels in the search area corresponding to that search position. The use of motion vectors is very effective in reducing the amount of data to be transmitted.

[0005] The MPEG-4 simple profile, which is intended for wireless video applications, is representative of the current level of technology in low-bit rate, error resilient video coding. From the viewpoint of system design, all the proposed techniques have to be implemented in the highly power constrained, battery operated environment. Hence, to prolong battery life, system and algorithm parameters are preferably modified based upon the data being processed.

[0006] The source coding model of MPEG-4 simple profile (which is based on the H.263 standard) employs block-based motion compensation for exploiting temporal redundancy and discrete cosine transform for exploiting spatial redundancy. The motion estimation process is computationally intensive and accounts for a large percentage of the total encoding computations. Hence there is a need for developing methods that accurately compute the motion vectors in a computationally efficient manner.

[0007] The fixed size block-matching (FSBM) technique for determining the motion vectors is the most computationally intensive technique among all known techniques, but it gives the best results as it evaluates all the possible search positions in the given search region. Techniques based on the unimodal error surface assumption, such as the N-step search and logarithmic search, achieve a large fixed magnitude of computational reduction irrespective of the contents being processed. However, the drop in peak signal-to-noise ratio (PSNR) due to local minima problems leads to a perceptible difference in visual quality, especially for high activity sequences.

[0008] The multi-resolution motion estimation technique of finding the motion vectors is a computationally efficient technique compared to the FSBM algorithm. In this technique, coarse values of motion vectors are obtained by performing the motion vector search on a low-resolution representation of the reference macroblock and the search area. This estimate is progressively refined at higher resolutions by searching within a small area around these coarse motion vectors (also referred to as candidate motion vectors, or CMVs) obtained from the higher level.

[0009] The number of candidate motion vectors propagated to the higher resolution images is usually fixed by the algorithm designer to be a single number, irrespective of the sequence or the macroblock characteristics. Each CMV contributes to a determinate number of computations. Hence, by using a prefixed number of CMVs, either the PSNR obtained may be low if a small number of CMVs are propagated or the computational complexity becomes large if many CMVs are propagated. In a power constrained environment, propagating many CMVs would reduce battery life. Hence, fixed solutions for multi-resolution motion estimation either have a high power requirement if PSNR is to be maintained or may result in poor image quality when a fixed low computational complexity technique is used.

SUMMARY OF THE INVENTION

[0010] An object of the present invention is to provide an efficient low power motion estimation of a video frame sequence in which processing requirements are minimized while maintaining picture quality at desired levels.

[0011] Another object of the present invention is to provide a system and method of scaling computations in the technique of multi-resolution mean pyramid that is based upon the video content being processed.

[0012] Still another object of the present invention is to provide a system and method for reducing the computations required in determining the motion vectors associated with reference macroblocks having low frequency content over the macroblock.

[0013] To achieve the above objects, the present invention provides a method for minimizing computations required for compression of motion video frame sequences using a multi-resolution level mean pyramid technique while maintaining at least a predetermined picture quality level. This is done by dynamically adjusting the number of CMVs propagated to each higher resolution level.

[0014] More particularly, the method may include establishing a relationship between quantized values of frequency content of the reference macroblocks in the video frames against distortion levels resulting from the mean pyramid averaging process, determining the frequency content of each macroblock, and predicting the distortion resulting from mean pyramid generation over the frequency content using the relationship. The method may further include computing the limiting mean absolute difference (MAD) value for maintaining picture quality using the predicted distortion value, and propagating those CMVs whose MAD value falls below the limiting MAD value.

[0015] The relationship may be established using a training sequence of video frames including generating mean pyramids on the reference blocks and on the corresponding search area at each level, generating deviation pyramids for the reference block by computing the mean deviation of each pixel at a given level from corresponding pixels at the lower level, and computing the average deviation estimate (ADE) at each resolution level by averaging the deviation pyramid values at that level. Moreover, the ADE value may be quantized to determine quantized ADE (QADE) for the corresponding reference block, and corresponding MAD may be computed for all search positions at a lowest resolution level.

[0016] Establishing the relationship may also include propagating the maximum allowed number of CMVs corresponding to the lowest MAD values to a next higher resolution level, computing MAD values at search positions around the CMV positions obtained from the lower resolution level, and identifying those search positions in each level that correspond to the least MAD obtained at the highest resolution level as the final motion vector position for that level and the corresponding MAD value as the MAD_(corr) for that level. Further, distortion may be computed as the difference between MAD_(corr) and MAD_(min) at each level, and the maximum of the distortion values obtained at each level over all training frames corresponding to each QADE value may be saved in a look-up table.

[0017] The frequency contents may be determined by computing quantized average deviation estimate (QADE) for each macroblock in the video frame. The distortion level may be predicted by extracting the estimated distortion value corresponding to the frequency content using the relationship established during training.

[0018] The limiting MAD for each level may be equal to the minimum computed MAD at that level incremented by the predicted distortion value. The training sequence may be re-triggered whenever the frame average MAD variation (FAMV) over the sequence exceeds a predetermined threshold value over a few frames, where the FAMV is the difference between the frame average MAD (FAM) value for the current frame and the delayed-N-frame average MAD value (DNFAM) for the previous N frames. Further, FAM may be the average of the averaged MAD values for all the reference macroblocks in a frame, and DNFAM may be the average of the FAM values for the previous N frames.

[0019] The QADE may be a value obtained after quantizing the average of the mean deviation of the mean pyramid values from the original pixel values, over the reference macroblock. The estimated distortion value may be obtained from a look-up table that matches QADE values to predicted distortion values.

[0020] The invention further relates to a system for minimizing computations required for compression of motion video frame sequences using a multi-resolution mean pyramid technique while maintaining at least a predetermined picture quality level by dynamically adjusting the number of CMVs propagated to each higher resolution level. The system may include means or circuitry for establishing a relationship between quantized values of frequency content of the reference macroblocks in the video frames against distortion levels resulting from the mean pyramid averaging process, means or circuitry for determining the frequency content of each macroblock, and means or circuitry for predicting the distortion resulting from mean pyramid generation over the frequency content using the relationship. The system may also include means or circuitry for computing the limiting MAD value for maintaining picture quality using the predicted distortion value, and means or circuitry for propagating those motion vectors whose MAD value falls below the limiting MAD value.

[0021] The relationship may be established using a training sequence of video frames. To this end, the system may further include means or circuitry for generating mean pyramids on the reference blocks and on the corresponding search area at each level, means or circuitry for generating deviation pyramids for the reference block by computing the mean deviation of each pixel at a given level from corresponding pixels at the lower level, and means or circuitry for computing the average deviation estimate (ADE) at each resolution level by averaging the deviation pyramid values at that level. The system may further include means or circuitry for quantizing the ADE value to determine quantized ADE (QADE) for the corresponding reference block, means or circuitry for computing corresponding MAD for all search positions at a lowest resolution level, means or circuitry for propagating the maximum allowed number of CMVs corresponding to the lowest MAD values to the next higher resolution level, and means or circuitry for computing MAD values at search positions around the CMV positions obtained from a lower resolution level.

[0022] Moreover, the system may further include means or circuitry for identifying those search positions in each level that correspond to the least MAD obtained at the highest resolution level as the final motion vector position for that level and the corresponding MAD value as the MAD_(corr) for that level, and means or circuitry for computing distortion as the difference between MAD_(corr) and MAD_(min) at each level. Means or circuitry for saving the maximum of the distortion values obtained at each level over all training frames corresponding to each QADE value in a look-up table may also be included.

[0023] The frequency contents may be determined by means or circuitry for computing QADE for each macroblock in the video frame. The distortion level may be predicted by means or circuitry for extracting the estimated distortion value corresponding to the frequency content using the relationship. The limiting MAD value for each level may be obtained by means or circuitry for incrementing the minimum computed MAD at that level by the predicted distortion value.

[0024] In addition, the training sequence may be re-triggered whenever the frame average MAD variation (FAMV) over the sequence exceeds a predetermined threshold value over a few frames, the FAMV being determined by means or circuitry for computing the difference between the frame average MAD (FAM) value for the current frame and the delayed-N-frame average MAD value (DNFAM) for the previous N frames, where FAM is the average of the averaged MAD values for all the reference macroblocks in a frame, and DNFAM is the average of the FAM values for the previous N frames.

[0025] The QADE may be a value obtained using means or circuitry for quantizing the average of the mean deviation of the mean pyramid values from the original pixel values over the reference macroblock. The estimated distortion value may be obtained by a look-up table that matches QADE values to predicted distortion values.

[0026] The present invention also provides a computer readable medium comprising computer readable program code stored on a computer readable storage medium embodied therein for minimizing computations required for compression of motion video frame sequences using a multi-resolution level mean pyramid technique while maintaining at least a predetermined picture quality level. This may be done by dynamically adjusting the number of CMVs propagated to each higher resolution level.

[0027] The computer readable medium may include computer readable program code for implementing the steps of establishing a relationship between quantized values of frequency content of the reference macroblocks in the video frames against distortion levels resulting from the mean pyramid averaging process, determining the frequency content of each macroblock, and predicting the distortion resulting from mean pyramid generation over the frequency content using the relationship. Further, the code may be for implementing the steps of computing the limiting MAD value for maintaining picture quality using the predicted distortion value, and propagating those motion vectors whose MAD value falls below the limiting MAD value.

[0028] The relationship is established using a training sequence of video frames, for which the program code may implement the steps of generating mean pyramids on the reference blocks and on the corresponding search area at each level, generating deviation pyramids for the reference block by computing the mean deviation of each pixel at a given level from corresponding pixels at the lower level, computing the average deviation estimate (ADE) at each resolution level by averaging the deviation pyramid values at that level, and quantizing the ADE value to determine quantized ADE (QADE) for the corresponding reference block. Moreover, a corresponding MAD may be computed for all search positions at a lowest resolution level, the maximum allowed number of candidate motion vectors (CMVs) corresponding to the lowest MAD values may be propagated to the next higher resolution level, and MAD values may be computed at search positions around the CMV positions obtained from the lower resolution level.

[0029] Additional relationship establishing steps implemented by the program code may include identifying those search positions in each level that correspond to the least MAD obtained at the highest resolution level as the final motion vector position for that level and the corresponding MAD value as the MAD_(corr) for that level, and computing distortion as the difference between MAD_(corr) and MAD_(min) at each level. Additionally, the maximum of the distortion values obtained at each level over all training frames corresponding to each QADE value may be saved in a look-up table.

[0030] The frequency content may be determined by computing quantized average deviation estimate (QADE) for each macroblock in the video frame. The distortion level may be predicted by extracting the estimated distortion value corresponding to the frequency content using the relationship. Further, the limiting MAD value for each level may be obtained by incrementing the minimum computed MAD at that level by the predicted distortion value.

[0031] The training sequence may be re-triggered whenever the frame average MAD variation (FAMV) over the sequence exceeds a predetermined threshold value over a few frames. The FAMV may be determined by the computer readable program code means for computing the difference between the frame average MAD (FAM) value for the current frame and the delayed-N-frame average MAD value (DNFAM) for the previous N frames, where FAM is the average of the averaged MAD values for all the reference macroblocks in a frame and DNFAM is the average of the FAM values for the previous N frames.

[0032] The QADE may be a value obtained by quantizing the average of the mean deviation of the mean pyramid values from the original pixel values over the reference macroblock. The estimated distortion value may be obtained by reading from a look-up table that matches QADE values to predicted distortion values.

BRIEF DESCRIPTION OF THE DRAWINGS

[0033] FIG. 1 is a schematic diagram illustrating the mean pyramid generation process for the reference macroblock and the process of performing the multi-resolution motion estimation in accordance with the present invention.

[0034] FIG. 2 is a schematic block diagram of a motion estimator in accordance with the present invention.

[0035] FIG. 3 includes waveform diagrams illustrating the variation of average CMVs per frame for different frames across different levels and for different sequences in accordance with the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0036] A diagrammatic view of the process of generating the mean pyramid is illustratively shown in FIG. 1. In equation form: $\begin{matrix}{x_{L}^{k}(p,q) = \left\lceil \frac{1}{4}\sum\limits_{u = -1}^{0}\sum\limits_{v = -1}^{0} x_{L - 1}^{k}(2p + u,\, 2q + v) \right\rceil,\quad 1 \leq L \leq 2,\quad 1 \leq p \leq \frac{N_{H}}{2^{L}},\quad 1 \leq q \leq \frac{N_{V}}{2^{L}}} & (1)\end{matrix}$

[0037] where L denotes the pyramid level, k denotes the frame number, p and q are pixel positions, and N_H and N_V denote the horizontal and vertical frame size respectively. Mean absolute difference (MAD) of pixel values is used as a measure to determine the motion vectors (MV) at each level L and is given by: $\begin{matrix}{MAD_{m,n}^{L,k}(i,j) = \frac{1}{I_{L}J_{L}}\sum\limits_{p = i}^{i + I_{L} - 1}\sum\limits_{q = j}^{j + J_{L} - 1}\left| x_{L}^{k}(p,q) - x_{L}^{k - 1}(p + m,\, q + n) \right|,\quad I_{L} = I/2^{L},\quad J_{L} = J/2^{L},\quad -s_{L} \leq m,n \leq s_{L}} & (2) \\ {MV_{L}^{k}(i,j) = \arg\min\limits_{-s_{L} \leq m,n \leq s_{L}} MAD_{m,n}^{L,k}(i,j)} & (3)\end{matrix}$

[0038] where m,n denote the search coordinates for the macroblock at the position (i,j), s_L is the level dependent search range, and I, J denote the macroblock height and width respectively.
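By way of illustration only, equations (1) through (3) may be sketched in Python as follows. The function names, the use of nested lists for frames, zero-based indexing, and the skipping of displacements that fall outside the reference frame are assumptions made for this sketch and are not taken from the specification.

```python
def mean_pyramid_level(prev):
    """Level L from level L-1: ceiling of the mean of each 2x2 block (equation (1))."""
    rows, cols = len(prev) // 2, len(prev[0]) // 2
    level = []
    for p in range(rows):
        row = []
        for q in range(cols):
            total = (prev[2 * p][2 * q] + prev[2 * p][2 * q + 1]
                     + prev[2 * p + 1][2 * q] + prev[2 * p + 1][2 * q + 1])
            row.append((total + 3) // 4)  # integer ceiling of total / 4
        level.append(row)
    return level


def build_mean_pyramid(frame, levels=2):
    """Return [level 0, ..., level `levels`]; level 0 is the input frame."""
    pyramid = [frame]
    for _ in range(levels):
        pyramid.append(mean_pyramid_level(pyramid[-1]))
    return pyramid


def mad(cur, ref, i, j, m, n, height, width):
    """Mean absolute difference for displacement (m, n) (equation (2))."""
    total = 0
    for p in range(i, i + height):
        for q in range(j, j + width):
            total += abs(cur[p][q] - ref[p + m][q + n])
    return total / (height * width)


def full_search(cur, ref, i, j, height, width, search_range):
    """All (MAD, (m, n)) pairs within +/- search_range, best first (equation (3))."""
    results = []
    for m in range(-search_range, search_range + 1):
        for n in range(-search_range, search_range + 1):
            # Assumption: displacements leaving the reference frame are skipped.
            inside = (0 <= i + m and i + m + height <= len(ref)
                      and 0 <= j + n and j + n + width <= len(ref[0]))
            if inside:
                results.append((mad(cur, ref, i, j, m, n, height, width), (m, n)))
    return sorted(results)
```

The first element of the list returned by `full_search` corresponds to the motion vector of equation (3); the remaining elements are the pool from which candidate motion vectors may be drawn.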

[0039] FSBM is performed at the highest level of the mean pyramid to detect random motion and obtain a low-cost estimate of the motion associated with the macroblock. This estimate is progressively refined at lower levels by searching within a small area around the motion vector obtained from the higher level. This process is shown in FIG. 1 where 2 CMVs are propagated from every level to the next lower level. In equation form, $\begin{matrix}{MV_{L}^{k}(i,j) = 2\, MV_{L + 1}^{k}(i,j) + \delta MV_{L}^{k}(i,j)} & (4)\end{matrix}$
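As a minimal sketch of equation (4), the positions examined at level L for one CMV inherited from level L+1 may be enumerated as shown below. The function name and the default refinement radius of 1 are assumptions for illustration; a radius of ±1 corresponds to the refinement range reported for the simulations.

```python
def refinement_positions(cmv_upper, radius=1):
    """Search positions at level L for a CMV found at level L+1.

    Each position is 2 * MV_{L+1} + delta, with delta ranging over
    [-radius, radius] along both axes (equation (4)).
    """
    m, n = cmv_upper
    return [(2 * m + dm, 2 * n + dn)
            for dm in range(-radius, radius + 1)
            for dn in range(-radius, radius + 1)]
```

With a radius of 1, each propagated CMV yields nine positions, so the cost at each level grows in proportion to the number of CMVs propagated.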

[0040] Since we determine the number of CMVs to be propagated based on the frequency content in the reference macroblock, we need to estimate this quantity. The deviation pyramid used to estimate the frequency characteristics of the macroblock being matched is defined as: $\begin{matrix}{D_{L}^{k}(p,q) = \left\lceil \frac{1}{4}\sum\limits_{r = -1}^{0}\sum\limits_{s = -1}^{0}\left| x_{L - 1}^{k}(2p + r,\, 2q + s) - x_{L}^{k}(p,q) \right| \right\rceil} & (5)\end{matrix}$

[0041] The deviation pyramid measures the deviation of the mean from the original pixel values. It is representative of the error introduced by the process of averaging. To obtain a single quantity representative of the frequency content of the macroblock, we sum up the deviation pyramid values generated for the reference macroblock at each level. A reference macroblock with low frequencies sums up to a small value, whereas the presence of high frequencies results in a large aggregate.

[0042] The average deviation estimate (ADE) of the macroblock at position (i,j) is given by $\begin{matrix}{ADE_{L}^{k}(i,j) = \frac{1}{IJ}\sum\limits_{r = 0}^{I - 1}\sum\limits_{s = 0}^{J - 1} D_{L}^{k}(i + r,\, j + s)} & (6)\end{matrix}$
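Equations (5) and (6) may be illustrated with the following sketch, which assumes nested-list frames and zero-based indexing. The function names are hypothetical, and the block height and width are passed explicitly as the macroblock dimensions at the level being evaluated.

```python
def deviation_level(prev, mean):
    """Deviation pyramid at level L (equation (5)).

    `prev` is the level L-1 image and `mean` is the level L mean image
    derived from it. Each output value is the ceiling of the average
    absolute deviation of a 2x2 block from its mean.
    """
    dev = []
    for p in range(len(mean)):
        row = []
        for q in range(len(mean[0])):
            total = sum(abs(prev[2 * p + r][2 * q + s] - mean[p][q])
                        for r in (0, 1) for s in (0, 1))
            row.append((total + 3) // 4)  # integer ceiling of total / 4
        dev.append(row)
    return dev


def ade(dev, i, j, height, width):
    """Average deviation estimate over a block at (i, j) (equation (6))."""
    total = sum(dev[i + r][j + s] for r in range(height) for s in range(width))
    return total / (height * width)
```

A flat block yields an ADE of zero, while a block with strong local variation yields a larger value, which is the property the method relies on as a proxy for frequency content.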

[0043] To estimate the content complexity characteristics, we define a term called the distortion band which gives the difference between the minimum MAD found at a particular level and the MAD value corresponding to the correct motion vector position. This is given by:

$\begin{matrix}{band^{L}(i,j) = MAD_{corr}^{L}(i,j) - MAD_{min}^{L}(i,j)} & (7)\end{matrix}$

[0044] This value, if known, can be used to determine the threshold MAD value, and all motion vectors whose MAD falls below this value can be passed to the lower level as CMVs. The distortion band value needs to be predicted, and in accordance with the present invention the distortion band value is predicted using the ADE defined above. The relationship between the distortion band value and the ADE can be non-linear for a particular video sequence, and this relation is learned during training.
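The thresholding step described above may be sketched as follows, taking the limiting MAD as the minimum MAD at the level plus the predicted distortion band. The function name is hypothetical. Two choices here are assumptions of the sketch: the comparison is inclusive, so that the best vector is always propagated even when the predicted band is zero, and the result is capped at a maximum count (9 is the value reported for the simulations).

```python
def select_cmvs(search_results, predicted_band, max_cmvs=9):
    """Pick the motion vectors to propagate to the next lower level.

    `search_results` is a list of (mad_value, (m, n)) pairs for one level.
    Vectors whose MAD does not exceed min_mad + predicted_band are kept,
    best first, up to `max_cmvs` entries.
    """
    ordered = sorted(search_results)
    limiting_mad = ordered[0][0] + predicted_band
    kept = [mv for value, mv in ordered if value <= limiting_mad]
    return kept[:max_cmvs]
```

A small predicted band therefore propagates few CMVs and saves computation, while a large band propagates more CMVs to protect picture quality.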

[0045] The relationship between the ADE values and the distortion band is determined during the training phase as follows. During training, the maximum allowed number of CMVs are propagated between all adjacent levels, and based on the final value of the motion vector the distortion band at each of the levels can be calculated as in equation (7). The ADE axis is uniformly quantized, and the maximum value of the distortion band at each of the quantized ADE (QADE) values is stored in memory.
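One possible form of the training table is sketched below. The class and function names, the quantization step, and the fallback returned for a QADE value never seen during training are all assumptions for illustration; the specification does not state a step size or how unseen QADE values are handled.

```python
def quantize_ade(ade_value, step=1.0):
    """Uniform quantization of the ADE axis; `step` is an assumed parameter."""
    return int(ade_value // step)


class DistortionBandTable:
    """Stores the maximum distortion band seen per (level, QADE) pair."""

    def __init__(self):
        self.table = {}

    def update(self, level, qade, band):
        """Training: record a band computed as in equation (7)."""
        key = (level, qade)
        self.table[key] = max(self.table.get(key, 0.0), band)

    def predict(self, level, qade, fallback=None):
        """Normal operation: return the stored maximum band.

        `fallback` is returned for a QADE never seen in training
        (an assumption of this sketch, not part of the described method).
        """
        return self.table.get((level, qade), fallback)
```

Storing the maximum rather than the mean band is the conservative choice described above: the predicted band is then at least as large as any band observed for that QADE during training.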

[0046] The schematic diagram of the proposed motion estimator is illustratively shown in FIG. 2. The functioning of each module is described below.

[0047] 1) Distortion Band Estimator:

[0048] The distortion band estimator is operational during the training of the motion estimator. During the training phase, the number of CMVs passed between adjacent levels is kept at the maximum value for all macroblocks at all levels. The distortion band estimator matches the decimated value of the final motion vector with the CMVs passed between adjacent levels and hence determines the MAD of the correct solution at each level. Based on the MAD of the correct solution and the best solution at every level, the distortion band is calculated as in equation (7). During normal operation the distortion band estimator is turned off.

[0049] 2) Distortion Band Predictor:

[0050] The distortion band predictor is look-up table driven. The table stores the maximum value of the distortion band corresponding to each of the QADEs at levels 1 and 2 obtained during training. During normal operation, the distortion band for the current reference macroblock is predicted based on the QADE value of the macroblock using the maximum value obtained during training corresponding to that particular QADE value at that level.

[0051] 3) QADE:

[0052] For every reference macroblock, the QADE of the reference macroblock is determined for both levels 1 and 2. Determining the QADE at levels 1 and 2 involves the use of equations (5)-(6) followed by uniform quantization.

[0053] 4) Content (Complexity) Change Detector:

[0054] On-line learning is performed at the beginning of every sequence. The content change detector is used to detect content complexity change within a sequence. It uses the frame average MAD (FAM), delayed N frame average MAD (DNFAM) and frame average MAD variation (FAMV), which are defined as follows: $\begin{matrix}{FAM(l) = \frac{1}{M}\sum\limits_{i}\sum\limits_{j} MAD_{\bar{m},\bar{n}}^{0,l}(i,j)} \\ {DNFAM(k) = \frac{1}{N}\sum\limits_{l = k - N - D}^{k - D} FAM(l)} \\ {FAMV(k) = FAM(k) - DNFAM(k)}\end{matrix}$

[0055] Here, M represents the number of macroblocks per frame, and ($\bar{m}$, $\bar{n}$) represent the motion vectors corresponding to the best MAD values for the macroblock at position (i,j). Re-training is initiated whenever the value of FAMV consistently crosses a preset threshold over a few frames. Delayed N frame averages are used to accurately determine local variations in FAM values. Computationally, determining FAM results in a single addition per macroblock. DNFAM and FAMV are computed once every frame. Hence, the computational overhead of this block is very low.
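The content change detector may be sketched as follows. The class name, the default parameter values, the use of the magnitude of FAMV, and the reading of "consistently crosses ... over a few frames" as a count of consecutive frames (`patience`) are assumptions of this sketch. The averaging window is likewise taken as the N frames ending `delay` frames before the current one, so that a delay of 1 gives the previous N frames.

```python
from collections import deque


class ContentChangeDetector:
    """Signals re-training when FAMV stays beyond a threshold for several frames."""

    def __init__(self, n=4, delay=1, threshold=2.0, patience=3):
        self.n = n
        self.threshold = threshold
        self.patience = patience
        # FAM values of the frames needed to form the delayed window.
        self.history = deque(maxlen=n + delay - 1)
        self.run = 0  # consecutive frames with |FAMV| above threshold

    def update(self, best_mads):
        """Feed the best level-0 MAD of every macroblock in the current frame."""
        fam = sum(best_mads) / len(best_mads)
        retrain = False
        if len(self.history) == self.history.maxlen:
            window = list(self.history)[:self.n]  # oldest N stored frames
            dnfam = sum(window) / self.n
            famv = fam - dnfam
            self.run = self.run + 1 if abs(famv) > self.threshold else 0
            if self.run >= self.patience:
                retrain = True
                self.run = 0
        self.history.append(fam)
        return retrain
```

Only one running sum per frame and a short history of FAM values are needed, which is consistent with the low overhead noted above.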

[0056] In simulations, the maximum number of CMVs passed is fixed at 9. Search data is reused at all levels of the search pyramid. The refinement range at levels 0 and 1 is fixed at ±1 along both axes. As a result, the search areas due to 2 CMVs at levels 1 and 0 can overlap in a maximum of 3 out of the 9 positions. This event is detected when either the x or the y component of the two CMVs has the same value and the difference in the other component value is unity. When such a condition occurs, the search area corresponding to one of the CMVs is reduced to eliminate the redundant computations.
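The overlap condition may be illustrated with the sketch below, which assumes the two CMVs are compared at the level where they were found, before the doubling of equation (4). The function names are hypothetical, and the set-based merge is a generalization that also removes any other duplicated position, rather than only the three-position case described above.

```python
def windows_share_positions(cmv_a, cmv_b):
    """True when one component matches and the other differs by exactly one."""
    (ma, na), (mb, nb) = cmv_a, cmv_b
    return ((ma == mb and abs(na - nb) == 1)
            or (na == nb and abs(ma - mb) == 1))


def merged_search_positions(cmvs, radius=1):
    """Refinement positions for all CMVs, each position listed only once."""
    seen = set()
    ordered = []
    for m, n in cmvs:
        for dm in range(-radius, radius + 1):
            for dn in range(-radius, radius + 1):
                pos = (2 * m + dm, 2 * n + dn)
                if pos not in seen:
                    seen.add(pos)
                    ordered.append(pos)
    return ordered
```

For the CMVs (0, 0) and (0, 1), the two 3x3 refinement windows are centered two positions apart after doubling and share one column of three positions, so 15 rather than 18 MAD evaluations are needed.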

[0057] To estimate the speedup factor, it is assumed that the addition operation involved in pyramid generation contributes to half the cost of the basic operation involved in MAD calculation, which involves the absolute difference operation followed by addition. Simulation results are given in Tables 1 through 3 below. All PSNR values quoted are for the Y-component of the image.

TABLE 1
Average PSNR comparisons

Sequence   N-Step         Adaptive Pyramid   Frames/Training
Carphone   32.24 (0.27)   31.74 (0.04)       380/3
Foreman    31.63 (0.48)   31.79 (0.05)       400/4
Claire     42.94 (0.01)   42.94 (0.01)       300/1
MissA      41.27 (0.07)   41.28 (0.06)       150/1
Mother     40.58 (0.04)   40.57 (0.05)       300/1
Grandma    42.57 (0.00)   42.57 (0.00)       300/1
Salesman   39.98 (0.05)   39.99 (0.04)       300/1
Football   23.98 (2.02)   25.93 (0.07)       210/3
Flower     21.88 (2.12)   23.79 (0.14)       150/1

[0058] In Table 1, the figures in parentheses denote the PSNR drop compared to FSBM. The last column gives the total number of frames in the sequence and the number of times training is initiated for the sequence. For the fast-action sequences Football and Flower, PSNR drops drastically for an N-step search whereas the proposed algorithm maintains PSNR close to FSBM. Similar results are seen for Foreman and Carphone.

TABLE 2
Computational complexity comparisons using average CMVs and speedup factors

           Average CMVs         Speedup w.r.t.
Sequence   Level 1   Level 0    FSBM     N-step
Carphone   3.22      4.14       22.23    1/1.41
Foreman    5.51      3.73       20.50    1/1.53
Claire     1.82      1.48       43.54    1.40
MissA      3.62      4.88       19.92    1/1.56
Mother     1.65      1.92       37.74    1.21
Grandma    2.43      2.18       32.79    1.05
Salesman   1.13      1.16       50.00    1.60
Football   5.25      4.63       21.70    1/1.48
Flower     3.37      3.16       27.53    1/1.16

[0059] Table 2 shows that the proposed algorithm scales computations depending on the complexity of the sequence. The reduction in average computations per macroblock for the FSBM and the N-step algorithm due to the macroblocks at the frame edges (which have smaller search regions) is taken into consideration while computing these speedup factors. The close PSNR match between FSBM and the proposed algorithm and the range of computational scaling validate the utility of content specific training.

[0060] The variation of average CMVs per frame for levels 1 and 0 for two sequences may be seen in FIG. 3. The discontinuities in the curves denote the retraining frames where 9 CMVs are passed for all macroblocks at both levels 1 and 2. Depending on the content, there is a large variation in the average CMV value within the sequence for the Foreman sequence. The content change detector is able to determine the frame positions where content, and hence content complexity, changes significantly and thus triggers retraining.

[0061] A method for eliminating computations for sequences with plain backgrounds in accordance with the invention will now be described. Sequences with plain backgrounds generate a large number of search positions that give approximately the same MAD, and this leads to an increase in the CMVs. Solutions to this problem of plain backgrounds which have been proposed in prior art literature use a static/non-static classifier operating at the macroblock level in the FSBM framework. The drop in PSNR is highly dependent on the accuracy of the classifier that makes a static/non-static decision based on the similarity between the higher order bits of the reference macroblock pixels and the exactly overlapping position of the search area. Two solutions provided in accordance with the invention for this phenomenon are as follows.

[0062] The first solution is such that, when the QADE at both levels 1 and 2 is zero and the best MAD at level 2 is less than a threshold threshold_level2, the CMVs passed to level 1 are limited to two. When the QADE at level 1 is zero and the best MAD at level 1 is less than a threshold threshold_level1, one CMV is passed to level 0.

[0063] The second solution is similar to the first solution except that under the same conditions at level 1, the best motion vector is interpolated at level 1 to obtain the final MV position. This completely eliminates computations at level 0.
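The two plain-background rules may be sketched as a cap on the number of CMVs passed down from a given level. The function name, the argument names other than the two thresholds, and the convention of returning 0 to mean "skip level 0" are assumptions of this sketch; the interpolation step of the second solution is not modeled here.

```python
def plain_background_cap(level, qade_level1, qade_level2, best_mad,
                         threshold_level2, threshold_level1,
                         max_cmvs=9, skip_level0=False):
    """Maximum number of CMVs to pass down from `level`.

    skip_level0=False follows the first solution; skip_level0=True follows
    the second, where a return value of 0 means no search is run at level 0
    and the final vector is derived from the best level-1 vector instead.
    """
    if (level == 2 and qade_level1 == 0 and qade_level2 == 0
            and best_mad < threshold_level2):
        return 2
    if level == 1 and qade_level1 == 0 and best_mad < threshold_level1:
        return 0 if skip_level0 else 1
    return max_cmvs
```

The cap would be applied after the band-based selection, so that macroblocks that do not meet these conditions are unaffected.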

[0064] The above solutions are based on the reasoning that if the current best matches are likely to give a high PSNR and the deviation values are low, then the final best solution is unlikely to be significantly different from the best solution tracked at the current level.

TABLE 3
Speedup factors and PSNR drop with background elimination

           Solution A                  Solution B
Sequence   Speedup w.r.t. FSBM  PSNR drop   Speedup w.r.t. FSBM  PSNR drop
Claire     49.98                0.03        74.04                0.05
MissA      40.41                0.08        49.21                0.10
Mother     50.92                0.05        60.00                0.06
Grandma    51.72                0.02        62.31                0.03

[0065] As may be seen in Table 3, the speedup factor improvessignificantly for all sequences with plain backgrounds, with a smalldrop in PSNR compared to FSBM.

[0066] Those of skill in the art will therefore appreciate that numerous advantages are provided by the present invention. For example, computations for some benchmark sequences are reduced by a factor of around 70 compared to the full-search block matching motion estimation algorithm. Moreover, the reduction in computations does not lead to a drastic drop in PSNR, as seen with the use of N-step search. Rather, the PSNR from the above described method is maintained close to that obtained from the FSBM algorithm.

That which is claimed is:
 1. A method for minimizing computations required for compression of motion video frame sequences using multi-resolution mean pyramid technique while maintaining at least a predefined picture quality level by dynamically adjusting the number of Candidate Motion Vectors (CMVs) propagated to each higher resolution level comprising: establishing a relationship between quantized values of frequency content of the reference macroblocks in said video frames against distortion levels resulting from the mean pyramid averaging process; determining the frequency content of each said macroblock; predicting the distortion resulting from mean pyramid generation over said frequency content using said relationship; computing the limiting Mean Absolute Difference (MAD) value for maintaining picture quality using said predicted distortion value; and propagating those CMVs whose MAD value falls below said limiting MAD value.
 2. The method as claimed in claim 1, wherein said relationship is established using a training sequence of video frames, comprising the steps of: generating mean pyramids on the reference blocks and on the corresponding search area at each level; generating deviation pyramids for said reference block by computing the mean deviation of each pixel at a given level from corresponding pixels at the lower level; computing the Average Deviation Estimate (ADE) at each resolution level by averaging said deviation pyramid values at that level; quantizing said ADE value so as to determine quantized ADE (QADE) for the corresponding reference block; computing corresponding MAD for all search positions at lowest resolution level; propagating the maximum allowed number of candidate motion vectors (CMVs) corresponding to the lowest MAD values to next higher resolution level; computing MAD values at search positions around the CMV positions obtained from lower resolution level; identifying those search positions in each level that correspond to the least MAD obtained at the highest resolution level as the final motion vector position for that level and the corresponding MAD value as the MAD_(corr) for that level; computing distortion as the difference between MAD_(corr) and MAD_(min) at each level; and saving the maximum of the distortion values obtained at each level over all training frames corresponding to each QADE value in a Look-up table.
 3. The method as claimed in claim 1, wherein said frequency content is determined by computing Quantized Average Deviation Estimate (QADE) for each macroblock in said video frame.
 4. The method as claimed in claim 1, wherein said distortion level is predicted by extracting the estimated distortion value corresponding to said frequency content using said relationship established during training.
 5. The method as claimed in claim 1, wherein said limiting MAD for each level is equal to the minimum computed MAD at that level incremented by said predicted distortion value.
 6. The method as claimed in claim 2, wherein said training sequence is re-triggered whenever the Frame Average MAD Variation (FAMV) over said sequence exceeds a pre-defined threshold value over a few frames, said FAMV being the difference between the Frame Average MAD (FAM) value for the current Frame and the Delayed-N-Frame Average MAD value (DNFAM) for the previous ‘N’ frames where FAM is the average of the averaged MAD values for all the reference macroblocks in a frame and DNFAM is the average of the FAM values for the previous ‘N’ frames.
 7. The method as claimed in claim 3, wherein said QADE is a value obtained after quantizing the average of the mean deviation of the mean pyramid values from the original pixel values, over said reference macroblock.
 8. The method as claimed in claim 4, wherein said estimated distortion value is obtained from a look-up table that matches QADE values to predicted distortion values.
 9. A system for minimizing computations required for compression of motion video frame sequences using multi-resolution mean pyramid technique while maintaining at least a pre-defined picture quality level by dynamically adjusting the number of Candidate Motion Vectors (CMVs) propagated to each higher resolution level comprising: means for establishing a relationship between quantized values of frequency content of the reference macroblocks in said video frames against distortion levels resulting from the mean pyramid averaging process; means for determining the frequency content of each said macroblock; means for predicting the distortion resulting from mean pyramid generation over said frequency content using said relationship; means for computing the limiting Mean Absolute Difference (MAD) value for maintaining picture quality using said predicted distortion value; and means for propagating those motion vectors whose MAD value falls below said limiting MAD value.
 10. The system as claimed in claim 9, wherein said relationship is established using a training sequence of video frames, comprising: means for generating mean pyramids on the reference blocks and on the corresponding search area at each level; means for generating deviation pyramids for said reference block by computing the mean deviation of each pixel at a given level from corresponding pixels at the lower level; means for computing the Average Deviation Estimate (ADE) at each resolution level by averaging said deviation pyramid values at that level; means for quantizing said ADE value so as to determine quantized ADE (QADE) for the corresponding reference block; means for computing corresponding MAD for all search positions at lowest resolution level; means for propagating the maximum allowed number of candidate motion vectors (CMVs) corresponding to the lowest MAD values to next higher resolution level; means for computing MAD values at search positions around the CMV positions obtained from lower resolution level; means for identifying those search positions in each level that correspond to the least MAD obtained at the highest resolution level as the final motion vector position for that level and the corresponding MAD value as the MAD_(corr) for that level; means for computing distortion as the difference between MAD_(corr) and MAD_(min) at each level; and means for saving the maximum of the distortion values obtained at each level over all training frames corresponding to each QADE value in a Look-up table.
 11. The system as claimed in claim 9, wherein said frequency content is determined by means for computing Quantized Average Deviation Estimate (QADE) for each macroblock in said video frame.
 12. The system as claimed in claim 9, wherein said distortion level is predicted by means for extracting the estimated distortion value corresponding to said frequency content using said relationship.
 13. The system as claimed in claim 9, wherein said limiting MAD value for each level is obtained by means for incrementing the minimum computed MAD at that level by said predicted distortion value.
 14. The system as claimed in claim 10, wherein said training sequence is re-triggered whenever the Frame Average MAD Variation (FAMV) over said sequence exceeds a pre-defined threshold value over a few frames, said FAMV being determined by means for computing the difference between the Frame Average MAD (FAM) value for the current Frame and the Delayed-N-Frame Average MAD value (DNFAM) for the previous ‘N’ frames where FAM is the average of the averaged MAD values for all the reference macroblocks in a frame and DNFAM is the average of the FAM values for the previous ‘N’ frames.
 15. The system as claimed in claim 11, wherein said QADE is a value obtained using means for quantizing the average of the mean deviation of the mean pyramid values from the original pixel values, over said reference macroblock.
 16. The system as claimed in claim 12, wherein said estimated distortion value is obtained by means of a look-up table that matches QADE values to predicted distortion values.
 17. A computer program product comprising computer readable program code stored on computer readable storage medium embodied therein for minimizing computations required for compression of motion video frame sequences using multi-resolution mean pyramid technique while maintaining at least a predefined picture quality level by dynamically adjusting the number of Candidate Motion Vectors (CMVs) propagated to each higher resolution level comprising: computer readable program code means configured for establishing a relationship between quantized values of frequency content of the reference macroblocks in said video frames against distortion levels resulting from the mean pyramid averaging process; computer readable program code means configured for determining the frequency content of each said macroblock; computer readable program code means configured for predicting the distortion resulting from mean pyramid generation over said frequency content using said relationship; computer readable program code means configured for computing the limiting Mean Absolute Difference (MAD) value for maintaining picture quality using said predicted distortion value; and computer readable program code means configured for propagating those motion vectors whose MAD value falls below said limiting MAD value.
 18. The computer program product as claimed in claim 17, wherein said relationship is established using a training sequence of video frames, comprising: computer readable program code means configured for generating mean pyramids on the reference blocks and on the corresponding search area at each level; computer readable program code means configured for generating deviation pyramids for said reference block by computing the mean deviation of each pixel at a given level from corresponding pixels at the lower level; computer readable program code means configured for computing the Average Deviation Estimate (ADE) at each resolution level by averaging said deviation pyramid values at that level; computer readable program code means configured for quantizing said ADE value so as to determine quantized ADE (QADE) for the corresponding reference block; computer readable program code means configured for computing corresponding MAD for all search positions at lowest resolution level; computer readable program code means configured for propagating the maximum allowed number of candidate motion vectors (CMVs) corresponding to the lowest MAD values to next higher resolution level; computer readable program code means configured for computing MAD values at search positions around the CMV positions obtained from lower resolution level; computer readable program code means configured for identifying those search positions in each level that correspond to the least MAD obtained at the highest resolution level as the final motion vector position for that level and the corresponding MAD value as the MAD_(corr) for that level; computer readable program code means configured for computing distortion as the difference between MAD_(corr) and MAD_(min) at each level; and computer readable program code means configured for saving the maximum of the distortion values obtained at each level over all training frames corresponding to each QADE value in a Look-up table.
 19. The computer program product as claimed in claim 17, wherein said frequency content is determined by computer readable program code means configured for computing Quantized Average Deviation Estimate (QADE) for each macroblock in said video frame.
 20. The computer program product as claimed in claim 17, wherein said distortion level is predicted by computer readable program code means configured for extracting the estimated distortion value corresponding to said frequency content using said relationship.
 21. The computer program product as claimed in claim 18, wherein said limiting MAD value for each level is obtained by computer readable program code means configured for incrementing the minimum computed MAD at that level by said predicted distortion value.
 22. The computer program product as claimed in claim 19, wherein said training sequence is re-triggered by computer readable program code means whenever the Frame Average MAD Variation (FAMV) over said sequence exceeds a predefined threshold value over a few frames, said FAMV being determined by said computer readable program code means for computing the difference between the Frame Average MAD (FAM) value for the current Frame and the Delayed-N-Frame Average MAD value (DNFAM) for the previous ‘N’ frames where FAM is the average of the averaged MAD values for all the reference macroblocks in a frame and DNFAM is the average of the FAM values for the previous ‘N’ frames.
 23. The computer program product as claimed in claim 20, wherein said QADE is a value obtained by computer readable program code means configured for quantizing the average of the mean deviation of the mean pyramid values from the original pixel values, over said reference macroblock.
 24. The computer program product as claimed in claim 21, wherein said estimated distortion value is obtained by computer readable program code means configured for reading from a look-up table that matches QADE values to predicted distortion values.
 25. A method for minimizing computations required for compression of motion video frame sequences using multi-resolution mean pyramid technique while maintaining at least a predefined picture quality level by dynamically adjusting the number of Candidate Motion Vectors (CMVs) propagated to each higher resolution level substantially as herein described with reference to and as illustrated in the accompanying drawings.
 26. A system for minimizing computations required for compression of motion video frame sequences using multi-resolution mean pyramid technique while maintaining at least a predefined picture quality level by dynamically adjusting the number of Candidate Motion Vectors (CMVs) propagated to each higher resolution level substantially as herein described with reference to and as illustrated in the accompanying drawings.
 27. A computer program product for minimizing computations required for compression of motion video frame sequences using multi-resolution mean pyramid technique while maintaining at least a predefined picture quality level by dynamically adjusting the number of Candidate Motion Vectors (CMVs) propagated to each higher resolution level substantially as herein described with reference to and as illustrated in the accompanying drawings.